fix(workspace): enforce isolation policy, surface degraded mode in TUI#249
Open
windoliver wants to merge 17 commits intomainfrom
Open
fix(workspace): enforce isolation policy, surface degraded mode in TUI#249windoliver wants to merge 17 commits intomainfrom
windoliver wants to merge 17 commits intomainfrom
Conversation
Fixes #208. Workspace provisioning failures in SessionOrchestrator and SpawnManager were silently swallowed, causing agents to run in a shared project root with no operator visibility. Changes: - workspace-provisioner: fully async via execFileAsync (no event-loop block, no shell injection); parallel cleanup with Promise.allSettled - WorkspaceMode tagged union (isolated_worktree | fallback_workspace | bootstrap_failed) + WorkspaceIsolationPolicy (strict | allow-fallback) exported from core/index - session-orchestrator: provisionAgentWorkspace() helper with exhaustive policy-aware control flow; workspaceMode on AgentSessionInfo; default policy strict - spawn-manager: provisionWorkspace() replaces inline execSync git worktree; setIsolationPolicy(); workspaceMode on SpawnResult; default policy allow-fallback (TUI backward compat) - session-service: threads workspaceIsolationPolicy through to orchestrator - spawn-progress: [shared workspace] / [no config] badge + ⚠ reason line for degraded agents; screen-manager threads workspaceMode from SpawnResult - Tests: workspace-bootstrap unit tests (7), policy enforcement tests in orchestrator (5) and spawn-manager (2), retrofitted all existing spawn tests with workspaceMode assertions - E2E: tmux capture-pane test (3 scenarios) verifies badge visible on screen for fallback_workspace, hard failure for strict, clean for isolated_worktree
delegates/feeds/escalates edges now cause the target role's worktree to
branch off the source role's grove branch instead of HEAD.
coder → reviewer (delegates)
coder worktree: grove/<sessionId>/coder (base: HEAD)
reviewer worktree: grove/<sessionId>/reviewer (base: grove/<sessionId>/coder)
This gives reviewers a real git branch relationship — they can run
`git merge grove/<sessionId>/coder` to get the coder's latest commits
without touching the main checkout.
Other edge types (reports, requests, feedback) remain independent (HEAD).
Changes:
- topology: WORKSPACE_BRANCH_EDGES constant, resolveRoleWorkspaceStrategies(),
topologicalSortRoles() — Kahn's algorithm ensures source branches exist
before dependents try to base their worktrees on them
- session-orchestrator: provision workspaces in topological order then spawn
in parallel; passes resolved baseBranch to provisionAgentWorkspace()
- grove-md-builder: annotates edge_type lines with workspace behaviour comment
so GROVE.md is self-documenting
- sqlite: worktree_strategy_json column on sessions table (migration + DDL);
stores resolved strategies at creation time for operator visibility
- session: worktreeStrategies field on Session type
- tests: 13 new unit tests for resolveRoleWorkspaceStrategies and
topologicalSortRoles covering all 6 edge types + chain ordering
SpawnManager now uses resolveRoleWorkspaceStrategies() to pick the correct baseBranch per role based on topology edge types. ScreenManager passes the resolved topology to SpawnManager before spawning begins. Also fixes the branch-naming mismatch: spawn() now uses wsSessionId (stable session-level ID) for provisionWorkspace so the branch computed by resolveRoleWorkspaceStrategies matches the branch actually created for the source role. Validated via 3 tmux capture-pane scenarios: 1. delegates + real git → both isolated_worktree, no badge 2. feedback + real git → both isolated_worktree, independent 3. delegates + no git → both fallback_workspace, [shared workspace] badge
The TUI screen-manager fired all role spawns concurrently, racing coder and reviewer worktree provisioning. With delegates edges the reviewer needs coder's git branch to exist first — concurrent spawns caused reviewer to fail with "git worktree add ... branch not found" and fall back to shared workspace. Fix: import topologicalSortRoles and spawn sequentially — source roles complete before dependents start. Validated via real TUI E2E: 1. grove up → New Session → review-loop → goal → launch 2. Spawn-progress shows: coder spawning → coder started → reviewer spawning → reviewer started 3. git worktree list confirms both worktrees on correct branches 4. Reviewer branch based on coder branch (delegates edge)
SessionOrchestrator hardcoded mcpServePath as join(projectRoot, "src/mcp/serve.ts") which only works when the project IS the grove repo. For any other project (e.g. /tmp/foo), agents got a non-existent MCP server path and couldn't call grove_submit_work, grove_submit_review, or grove_done. Fix: extract resolveMcpServePath() into shared utility that derives the path from process.argv[1] (the running grove CLI entry point), climbing 3 levels to find the grove installation directory. Falls back through dist/mcp/serve.js → src/mcp/serve.ts → import.meta.url variants. DRYs up SpawnManager which had the same logic inline in two places. Validated E2E: real claude agent via acpx in /tmp test project successfully called grove_submit_work — contribution CID stored in session_contributions with the correct session link.
…from edge type
Edge type describes communication semantics. Workspace strategy describes
file isolation. These are independent concerns — a reviewer might want a
delegates edge but still work from HEAD.
Before: WORKSPACE_BRANCH_EDGES constant implicitly derived workspace
strategy from edge type (delegates/feeds/escalates → branch_from_source).
No user control, surprise behavior when adding edges.
After: explicit opt-in field on each edge:
edges:
- target: reviewer
edge_type: delegates
workspace: branch_from_source # optional, default: independent
- Removed WORKSPACE_BRANCH_EDGES constant
- resolveRoleWorkspaceStrategies checks edge.workspace, not edge type
- topologicalSortRoles checks edge.workspace for ordering
- Zod schema validates "branch_from_source" | "independent"
- Wire conversion passes workspace through
- review-loop preset updated with explicit workspace: branch_from_source
- Reverted grove-md-builder YAML comments (GROVE.md unchanged)
- grove-md-builder renders workspace field when present
…ediately grove session start exited right after spawning agents — the reviewer never got a chance to receive routed events from the coder's contribution. Fix: SessionOrchestrator.waitForCompletion() polls agent status every 3s and resolves when all agents are idle/stopped or timeout (5min) expires. CLI awaits this before marking session complete and closing DB. Validated: session now reaches "All agents idle — session complete" with coder submitting work via grove_submit_work. Reviewer receives routed events but doesn't call grove_submit_review (agent prompt issue, not infrastructure).
Three bugs prevented the review-loop handoff from completing: 1. Missing .acpxrc.json in orchestrator path bootstrapWorkspace only wrote .mcp.json, but acpx reads .acpxrc.json. Without it, agents had zero MCP tools — couldn't call grove_submit_work or grove_submit_review. Fix: write .acpxrc.json alongside .mcp.json. 2. Duplicate prompts orchestrator.start() sent goals via runtime.send() AFTER spawn() already sent them as initial prompt. Reviewer got the same prompt twice. Fix: remove the redundant send loop — spawn passes goals via AgentConfig.goal. 3. Premature session stop checkAllIdle() stopped the session when both agents went idle, even if no contributions existed yet. Coder goes idle briefly between editing and calling grove_submit_work. Fix: 30s grace period — don't auto-stop until at least one contribution exists or 30s have passed. Validated E2E with real claude agents via acpx: coder → work: "Added greet(name) function" reviewer → review: "LGTM — greet(name) correctly returns..." reviewer → discussion: "[DONE] Approved — implementation is correct"
Edge types simplified to 3 core types: delegates — forward work (A produces, B acts on it) feedback — response (B sends results back to A) monitors — observe-only (A watches B) Legacy aliases (reports, feeds, requests, escalates) kept in schema for backward compat but all presets migrated to use core types only. Routing message now includes source branch and merge instructions so the receiving agent can actually see file changes: "To see the actual file changes, run: git merge grove/<session>/<source>" CLAUDE.md updated with workflow steps: 1. git merge <source_branch> to get their files 2. Review the code 3. Call grove_submit_review with targetCid from notification
Issue A — git commit as work product: grove_submit_work now accepts commitHash (git SHA) instead of requiring CAS-per-file artifacts. Agents just commit and pass the hash. Other agents git fetch + git diff to see changes. Old artifacts path still works for backward compat. Changed: models.ts (commitHash field), contributions.ts (MCP schema), contribute.ts (skip CAS validation when commitHash present), acpx-runtime.ts (system-reminder uses git workflow), workspace-bootstrap.ts (CLAUDE.md teaches git-based workflow) Issue B — broadcast topology mode: Roles can set mode: "broadcast" to notify all other roles without explicit edges. No more N×(N-1) edge definitions for swarms. review-loop preset: coder and reviewer both use mode: "broadcast" instead of explicit delegates/feedback edges. Changed: topology.ts (mode field on role schema + wire type), topology-router.ts (broadcast routes to all roles), grove-md-builder.ts (renders mode), review-loop.ts preset Edge types (delegates/feedback/monitors) kept in schema for backward compat but no longer required when using broadcast mode.
…oundary MCP tools run in child processes (spawned by acpx/claude) with separate EventBus instances. Events published from grove_submit_work in the MCP process never reached the orchestrator's EventBus in the parent process. Fix: SessionOrchestrator polls the SQLite contribution store every 3s, detects new CIDs via a seen-set, and forwards them to downstream agents via runtime.send(). This is the same pattern the TUI uses via SpawnManager.startContributionPolling(). Validated: reviewer log shows the routed contribution message arriving with CID, summary, and source branch — the polling successfully bridges the process boundary.
…gent turns Contributions from fast agents (coder) were arriving in the same acpx session turn as the reviewer's initial prompt — the agent treated them as context instead of an action trigger and went idle. Fix: seed seenCids with pre-existing contributions, delay first poll by 15s so agents process their initial prompt first. Skip EventBus-based contribution forwarding entirely (polling handles it, EventBus can't cross process boundaries anyway). Validated E2E — full coder→reviewer handoff: work(coder): Added greet(name) function review(reviewer): LGTM — clean implementation work(coder): Added greet(name) function (iteration) discussion(reviewer): [DONE] Approved — implementation meets standards
Reviewer now gets the source agent's workspace path in the notification
instead of a git branch name. No git merge needed — the reviewer reads
files directly from the source workspace path:
[grove] New work from coder:
Workspace: /tmp/project/.grove/workspaces/coder-abc/
ACTION REQUIRED: Read the source files at /tmp/.../coder-abc/
The filesystem IS the handoff artifact. Both agents are on the same
machine — the reviewer just reads the coder's directory.
Updated CLAUDE.md and acpx system-reminder to teach agents the
path-based workflow instead of git merge.
pollContributions now checks for [DONE] prefix or context.done=true in contributions. When detected, stops the session with a descriptive reason. This mirrors use-done-detection.ts from the TUI layer. Combined with the CLI's markDone(), the session status in SQLite transitions to "completed" with the proper stop reason.
Coder prompt: git commit → git rev-parse HEAD → grove_submit_work({ commitHash })
Reviewer prompt: read files at Workspace path from notification, not summary
Validated E2E — full coder→reviewer handoff with real file inspection:
1. Coder edits app.js, commits, submits work
2. Polling routes notification with workspace path to reviewer
3. Reviewer reads coder's actual files (Read /path/to/coder/app.js)
4. Reviewer submits review with real code assessment
5. Reviewer calls grove_done → session status=completed
6. Project root untouched, worktrees isolated
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #208.
Summary
SessionOrchestrator.spawnAgent()andSpawnManager.spawn()were silently swallowed — agents ran in the shared project root with no error surfacedcatchwith policy-awareWorkspaceIsolationPolicy(strict|allow-fallback) andWorkspaceModetagged union (isolated_worktree|fallback_workspace|bootstrap_failed)[shared workspace]/[no config]badge + inline⚠ <reason>for degraded agentsChanges
Core
workspace-provisioner: fully async viaexecFileAsync(no event-loop blocking, no shell injection); parallel cleanup withPromise.allSettled;WorkspaceMode+WorkspaceIsolationPolicyexported fromcore/indexsession-orchestrator: newprovisionAgentWorkspace()helper with exhaustive policy-aware control flow;workspaceModeonAgentSessionInfo; default policystrictsession-service: threadsworkspaceIsolationPolicythrough to orchestratorTUI
spawn-manager: replaces inlineexecSync git worktree addwithprovisionWorkspace();setIsolationPolicy();workspaceModeonSpawnResult; defaultallow-fallback(backward compat)spawn-progress:[shared workspace]/[no config]badge +⚠ reasonline for degraded agentsscreen-manager: threadsworkspaceModefromSpawnResultintoAgentSpawnStateTests
workspace-bootstrapunit testssession-orchestratorspawn-managerworkspaceModeassertionsallow-fallback+ non-git dir →● started [shared workspace]+⚠ Command failed: git worktree add ...strict+ non-git dir →✗ failed: Workspace provisioning failed for '...'allow-fallback+ real git repo →● started(no badge — happy path)Full suite: 4898 pass, 0 fail.
Test plan
bun test— 4898 passbun test tests/tui/workspace-isolation-e2e.test.ts— 3 pass (requires tmux)[shared workspace]badge● started